YouTube Scale, Large Vocabulary Video Annotation
Authors
Nicholas Morsillo, Department of Computer Science, University of Rochester, Rochester, NY 14627
Gideon Mann, Google Research, 76 Ninth Avenue, New York, NY 10011
Christopher Pal, Département de génie informatique et génie logiciel, École Polytechnique de Montréal, Montréal, PQ, Canada H3T 1J4
Abstract
As video content on the web continues to expand, it is increasingly important to properly annotate videos for effective search and mining. While annotating static imagery with keywords is relatively well known, annotating videos with natural language keywords is an important emerging problem with great potential to improve the quality of video search. However, leveraging web-scale video datasets for automated annotation also presents new challenges and requires methods specialized for scalability and efficiency. In this chapter we review specific, state-of-the-art techniques for video analysis, feature extraction and classification suitable for extremely large-scale automated video annotation. We also review key algorithms and data structures that make truly large-scale video search possible. Drawing from these observations and insights, we present a complete method for automatically augmenting the keyword annotations of videos using previous annotations for a large collection of videos. Our approach is designed explicitly to scale to YouTube-sized datasets, and we present experiments and analysis of keyword augmentation quality using a corpus of over 1.2 million YouTube videos. We demonstrate that the automated annotation of web-scale video collections is indeed feasible, and that an approach combining visual features with existing textual annotations yields better results than unimodal models.
Similar resources
YouTube-8M: A Large-Scale Video Classification Benchmark
Many recent advancements in Computer Vision are attributed to large datasets. Open-source software packages for Machine Learning and inexpensive commodity hardware have reduced the barrier of entry for exploring novel approaches at scale. It is possible to train models over millions of examples within a few days. Although large-scale datasets exist for image understanding, such as ImageNet, the...
The Effects of Multimedia Annotations on Iranian EFL Learners’ L2 Vocabulary Learning
In our modern technological world, Computer-Assisted Language learning (CALL) is a new realm towards learning a language in general, and learning L2 vocabulary in particular. It is assumed that the use of multimedia annotations promotes language learners’ vocabulary acquisition. Therefore, this study set out to investigate the effects of different multimedia annotations (still picture annotatio...
An Effective Way to Improve YouTube-8M Classification Accuracy in Google Cloud Platform
Large-scale datasets have played a significant role in progress of neural network and deep learning areas. YouTube-8M is such a benchmark dataset for general multilabel video classification. It was created from over 7 million YouTube videos (450,000 hours of video) and includes video labels from a vocabulary of 4716 classes (3.4 labels/video on average). It also comes with pre-extracted audio &...
Kodak consumer video benchmark data set: concept definition and annotation
Semantic indexing of images and videos in the consumer domain has become a very important issue for both research and actual application. In this work we developed Kodak’s consumer video benchmark data set, which includes (1) a significant number of videos from actual users, (2) a rich lexicon that accommodates consumers’ needs, and (3) the annotation of a subset of concepts over the entire vid...
Assistive Sports Video Annotation: Modelling and Detecting Complex Events in Sports Video
Video analysis in professional sports is a relatively new assistive tool for coaching. Currently, manual annotation and analysis of video footage is the modus operandi. This is a laborious and time consuming process, which does not afford a cost effective or scalable solution as the demand and uses of video analysis grows. This paper describes a method for automatic annotation and segmentation ...